Open science is about making the methods, data and outcomes in your analysis available to everyone. It includes:
In this tutorial, you are not going to learn all aspects of open science as listed above. However, you will learn one tool that can be used to make your workflows:
You will learn how to document your work by connecting data, methods and outputs in one or more reports or documents. You will learn the R Markdown file format, which can be used to generate reports that connect your data, code (the methods used to process the data) and outputs. You will use the rmarkdown and knitr packages to write R Markdown files in RStudio and publish them in different formats (HTML, PDF, etc.).
Simply put, .Rmd is a text-based file format that allows
you to include descriptive text, code blocks and code output in one
document. You can run the code in R using a package called knitr (which
you will learn about next). You can export the text-formatted .Rmd file
to a nicely rendered, shareable format like PDF or HTML. When you knit
(that is, use knitr), the accompanying code is executed, so its outputs
(e.g. plots and other figures) appear in the rendered document.
“R Markdown (.Rmd) is an authoring format that enables easy creation of dynamic documents, presentations, and reports from R. It combines the core syntax of markdown (an easy to write plain text format) with embedded R code chunks that are run so their output can be included in the final document. R Markdown documents are fully reproducible (they can be automatically regenerated whenever underlying R code or data changes).” (RStudio documentation)
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk in your R Markdown document by opening it with three backticks followed by {r} and closing it with three backticks.
There are also several options that you can add to the chunk header
{r} to change how your code runs
(e.g. {r, include=FALSE}).
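To see chunk options in action from the console, the sketch below knits a tiny document held in a character vector. It assumes the knitr package is installed; the document text and the variable x are made up for illustration and are not part of this tutorial's data.

```r
library(knitr)

# A miniature R Markdown document, one element per line.
rmd <- c(
  "A chunk that shows its code and output:",
  "",
  "```{r}",
  "summary(cars)",
  "```",
  "",
  "A chunk that runs but is hidden from the report:",
  "",
  "```{r, include=FALSE}",
  "x <- 1 + 1",
  "```",
  "",
  "The value of x is `r x`."
)

# knit() executes the chunks and returns the resulting markdown as text.
md <- knit(text = rmd, quiet = TRUE)
cat(md)
```

The first chunk appears in the knitted text together with its results. The second chunk is executed but leaves no trace, apart from x being available to the inline expression in the last line.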
Now let’s learn additional basics that you can use for creating your markdown documents.
Plain text
End a line with two spaces
to start a new paragraph.
italics and bold
verbatim code
superscript^2^ and subscript~2~
strikethrough
escaped: * _ \
endash: –, emdash: —
equation: \(A = \pi*r^{2}\)
equation block: \[E = mc^{2}\]
block quote
HTML ignored in pdfs
Jump to Header 1
image:
unordered list
sub-item 1
sub-item 2
sub-sub-item 1
item 2 Continued (indent 4 spaces)
Term 1: Definition 1
| Right | Left | Default | Center |
|---|---|---|---|
| 12 | 12 | 12 | 12 |
| 123 | 123 | 123 | 123 |
| 1 | 1 | 1 | 1 |
horizontal rule/slide break: ***
A footnote [^1]
[^1]: Here is the footnote.
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.
At this point you may want to commit your changes to your version control system (Git) and push them to GitHub. Follow these steps to maintain a clear version history and ensure your work is backed up remotely.
Create a GitHub account (if you don’t already have one) at github.com.
Once signed in, click the New button on GitHub’s homepage to create a repository.
Copy the repository URL from GitHub by clicking the Code button and copying the URL.
Download Git from the Git website and install it on your system.
If this is your first time using Git with GitHub and RStudio, you may need to configure your credentials. Generate a personal access token by following these instructions from GitHub.
In RStudio, go to File > New Project.
Select Version Control > Git to initialize the project as a Git repository.
Enter the GitHub repository URL you copied earlier, and choose a local folder to clone the repository.
If you have an existing project with Git enabled, select Version Control > Git in RStudio and provide the repository URL from GitHub.
Once you clone the repository, you will see a Git tab in RStudio. This tab will help you manage your version control activities, including staging, committing, and pushing changes.
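The staging and committing that the Git tab performs can also be sketched from the R console. The example below is only an illustration: it assumes the git command-line program is installed and on your PATH, works in a throwaway folder rather than your cloned project, and uses a made-up file name and author identity.

```r
# Create a throwaway folder so nothing in a real project is touched.
repo <- tempfile("demo-repo-")
dir.create(repo)

# Small helper (hypothetical, for this sketch) that runs git inside `repo`.
git <- function(...) {
  system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
}

git("init")

# Make a change: add a new R Markdown file.
writeLines("# Analysis notes", file.path(repo, "analysis.Rmd"))

# Stage the file, then commit it with a message.
git("add", "analysis.Rmd")
git("-c", "user.name=Demo", "-c", "user.email=demo@example.com",
    "commit", "-m", shQuote("Added initial data analysis"))

# Inspect the history: one line per commit.
log <- git("log", "--oneline")
print(log)
```

In a project cloned from GitHub, a further `git push` would send the commit to the remote repository, which is what the Push button in the Git tab does.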
Make changes to your R scripts or R Markdown files.
Go to the Git tab in RStudio (located next to the Environment and History tabs).
Stage your changes:
Write a commit message (e.g., "Added initial data analysis").
Click Commit to save the changes locally.
library(ggplot2)
library(sf)
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.3.0
## ✔ purrr 1.1.0 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#getwd()
boulder <- st_read("/Users/paigelund/Desktop/EAS_548/advanced_geovisualization_week_two/BoulderSocialMedia/BoulderSocialMedia.shp")
## Reading layer `BoulderSocialMedia' from data source
## `/Users/paigelund/Desktop/EAS_548/advanced_geovisualization_week_two/BoulderSocialMedia/BoulderSocialMedia.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 55519 features and 12 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -788775 ymin: 1917813 xmax: -780555 ymax: 1930053
## Projected CRS: NAD_1983_Albers
boulder
## Simple feature collection with 55519 features and 12 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -788775 ymin: 1917813 xmax: -780555 ymax: 1930053
## Projected CRS: NAD_1983_Albers
## First 10 features:
## id DB extent Climb_dist TrailH_Dis NatMrk_Dis Trails_dis
## 1 6517284333 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## 2 6517281191 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## 3 6517278961 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## 4 6517276295 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## 5 6517274727 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## 6 6517272539 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## 7 6517270109 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## 8 6516904527 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## 9 6516902971 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## 10 6516900761 Flickr 421678.2 1973.108 2368.567 2451.633 49.73422
## Bike_dis PrarDg_Dis PT_Elev Hydro_dis Street_dis geometry
## 1 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
## 2 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
## 3 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
## 4 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
## 5 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
## 6 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
## 7 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
## 8 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
## 9 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
## 10 1437.134 1942.125 2064 1359.75 193.9165 POINT (-786099 1929916)
Here are the details of the data:
| Variable | Description |
|---|---|
| DB | indicates whether the point is a social media location (Flickr or Panoramio) or a point in the park |
| extent | extent that can be viewed at each point estimated through viewshed analysis |
| Climb_dist | distance to nearest climbing wall |
| TrailH_Dis | distance to hiking trails |
| NatMrk_Dis | distance to natural landmark |
| Trails_dis | distance to walking trails |
| Bike_dis | distance to biking trails |
| PrarDg_Dis | distance to prairie dog mounds |
| PT_Elev | Elevation |
| Hydro_dis | distance to lakes, rivers and creeks |
| Street_dis | distance to streets and parking lots |
We can map the points with the geom_sf function. The
different arguments control the attributes of the object (which can be
points, lines or polygons). For example, color = controls the color of
the object outline, fill = controls its interior color, and alpha =
controls its opacity. The final layer is a complete theme, which
controls the non-data display (e.g. neatlines, graticule, title). More
details regarding these themes can be found here:
https://ggplot2.tidyverse.org/reference/ggtheme.html.
Here we use theme_bw, which is the black and white theme.
You can try other themes to explore the different options.
ggplot() +
  geom_sf(data = boulder,
          fill = NA, alpha = .2) +
  theme_bw()
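To make the difference between outline color, interior color and opacity concrete, here is a minimal sketch on a handful of made-up points (the coordinates and elevations are hypothetical, not the Boulder data). It assumes the sf and ggplot2 packages are installed, and uses point shape 21 because that shape has a separate outline and interior.

```r
library(sf)
library(ggplot2)

# Hypothetical longitude/latitude points with an elevation attribute.
pts <- data.frame(
  lon  = c(-105.30, -105.28, -105.27, -105.25),
  lat  = c(39.99, 40.00, 40.02, 40.01),
  elev = c(1650, 1800, 2100, 2300)
)

# Turn the data frame into an sf object (WGS84 coordinates).
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

p <- ggplot() +
  geom_sf(data = pts_sf,
          shape = 21,        # circle with separate outline and interior
          color = "black",   # outline color
          fill  = "orange",  # interior color
          alpha = 0.5,       # opacity
          size  = 4) +
  theme_minimal()            # try theme_bw(), theme_void(), ...
p
```

Swapping theme_minimal() for another complete theme changes only the non-data elements of the map, which is a quick way to compare the available options.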
# Reproject the points to EPSG:26753 (NAD27 / Colorado North)
boulder <- st_transform(boulder, 26753)
ggplot() +
geom_sf(data =boulder,
fill = NA, alpha = .2) +
theme_bw()
ggplot() +
geom_sf(data =boulder, aes(color=PT_Elev),
fill = NA, alpha = .2) +
theme_bw()
ggplot2 has several gradient colour scale options. The
details can be found in the ggplot2 reference page for scale_colour_gradient.
ggplot() +
geom_sf(data =boulder, aes(color=PT_Elev),
fill = NA, alpha = .2) +
scale_colour_gradientn(colours = terrain.colors(10)) +
theme_bw()
We can create a new variable with the ifelse() function. The function
works as follows: if the first argument is true (PT_Elev >= 2200, i.e.
the elevation is at least 2200 meters), it returns the second argument,
TRUE; if not, it returns the third argument, FALSE. We use the mutate
function to add this new variable to our boulder data frame. We then
use ggplot to plot these locations.
# library(dplyr)
boulder %>%
mutate(high_elev = ifelse(PT_Elev >= 2200, TRUE, FALSE))%>%
ggplot() +
geom_sf(aes(color=high_elev),
fill = NA, alpha = .2) +
theme_bw()
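The behaviour of ifelse() is easiest to check on a short vector. The sketch below uses made-up elevations (not values from the Boulder data) and base R only. It also shows that, when the two outcomes are TRUE and FALSE, the comparison on its own already gives the same result.

```r
# Hypothetical elevations in meters, including a missing value.
elev <- c(1650, 2064, 2200, 2435, NA)

# ifelse(test, value_if_true, value_if_false), applied element by element.
high_a <- ifelse(elev >= 2200, TRUE, FALSE)

# The comparison itself is already a logical vector.
high_b <- elev >= 2200

# Both approaches agree, including on the missing value.
identical(high_a, high_b)

# ifelse() is more useful when the two outcomes are labels.
label <- ifelse(elev >= 2200, "high", "low")
print(label)
```

Inside mutate(), either form can be used to create the high_elev column; the labelled version gives a more readable legend.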
We can use filter() to analyze the social media points only. We use a
box plot to compare the distance of these photographs from the nearest
road; note that a box plot displays medians and quartiles rather than
means. What does this test?
boulder %>%
filter(DB == 'Pano' | DB == 'Flickr') %>%
ggplot(aes(x=DB, y=Street_dis)) +
geom_boxplot()
As you can see, there is no obvious difference between the two
groups: the centers and spreads of the distributions are highly
similar. A box plot is a visual comparison rather than a formal
significance test, so treat this as exploratory. There are numerous
other tests and charts that you can use to investigate the relationship
between the locations of social media photographs and other locations
in the park.
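One way to back up a visual comparison like this is to compute group summaries and run a formal test. The sketch below is an illustration on simulated distances, not on the Boulder data: the data frame photos, its values and the choice of a Wilcoxon rank-sum test are all assumptions made for the example.

```r
set.seed(42)

# Simulated street distances for two hypothetical photo sources.
photos <- data.frame(
  DB         = rep(c("Flickr", "Pano"), each = 200),
  Street_dis = c(rexp(200, rate = 1 / 150), rexp(200, rate = 1 / 150))
)

# Median distance per source (the line in the middle of each box).
medians <- aggregate(Street_dis ~ DB, data = photos, FUN = median)
print(medians)

# Rank-based test of whether the two distributions differ in location.
wt <- wilcox.test(Street_dis ~ DB, data = photos)
print(wt$p.value)
```

With the real data you would replace photos by the filtered boulder data frame. A rank-based test is used here because distance variables are often strongly skewed, but other tests may suit your question better.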
We are also going to learn about two new packages that might be
helpful for your data science approach. We will start with
library(viridis), which provides color palettes that remain
interpretable for readers with color vision deficiency.
The package viridis contains four color scales: “Viridis”, the primary choice, and three alternatives with similar properties, “magma”, “plasma”, and “inferno”.
library(sf)
library(ggspatial)
library(viridis)
## Loading required package: viridisLite
## the function returns hexadecimal colors
## the integer gives the number of colors
magma(10)
## [1] "#000004FF" "#180F3EFF" "#451077FF" "#721F81FF" "#9F2F7FFF" "#CD4071FF"
## [7] "#F1605DFF" "#FD9567FF" "#FEC98DFF" "#FCFDBFFF"
boulder <- st_read("/Users/paigelund/Desktop/EAS_548/advanced_geovisualization_week_two/BoulderSocialMedia/BoulderSocialMedia.shp")
## Reading layer `BoulderSocialMedia' from data source
## `/Users/paigelund/Desktop/EAS_548/advanced_geovisualization_week_two/BoulderSocialMedia/BoulderSocialMedia.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 55519 features and 12 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -788775 ymin: 1917813 xmax: -780555 ymax: 1930053
## Projected CRS: NAD_1983_Albers
ggplot() +
geom_sf(data = boulder, aes(color=PT_Elev),
fill = NA, alpha = .2) +
scale_colour_gradientn(colours = magma(10))
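ggplot2 also ships its own viridis scales, so the same palettes can be applied without building the color vector by hand. The sketch below is a minimal illustration on the built-in mtcars data (not the Boulder data) and assumes only that ggplot2 is installed.

```r
library(ggplot2)

# Continuous variable mapped to color, drawn with the "magma" palette.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point(size = 3) +
  scale_colour_viridis_c(option = "magma")
p
```

For a discrete variable, scale_colour_viridis_d() plays the same role; the option argument accepts the palette names listed above.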
We can also plot discrete values.
summary(boulder$DB)
## Length Class Mode
## 55519 character character
p <- ggplot() +
annotation_spatial(boulder) +
layer_spatial(boulder, aes(col = DB))
p + scale_color_brewer(palette = "Dark2")
Alternatively, we can use tmap, another way to create maps in R.
library(tmap)
## Add the data - these are specific to the vector or raster
tm_shape(boulder) +
## which variable, is there a class interval, palette, and other options
tm_symbols(col='PT_Elev',
style='quantile',
palette = 'YlOrRd',
border.lwd = NA,
size = 0.1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `symbols()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `symbols()`: use 'fill' for the fill color of polygons/symbols
## (instead of 'col'), and 'col' for the outlines (instead of 'border.col').
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "YlOrRd" is named
## "brewer.yl_or_rd"
## Multiple palettes called "yl_or_rd" found: "brewer.yl_or_rd", "matplotlib.yl_or_rd". The first one, "brewer.yl_or_rd", is returned.
It is really easy to add cartographic elements in tmap
## here we are using a simple dataset of the world
# tmap_mode("plot")
data("World")
tm_shape(World) +
tm_polygons("gdp_cap_est", style='quantile', legend.title = "GDP Per Capita Estimate")
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style' to 'tm_scale_intervals(<HERE>)'
## [tm_polygons()] Argument `legend.title` unknown.
## [tip] Consider a suitable map projection, e.g. by adding `+ tm_crs("auto")`.
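The messages above come from tmap version 4 translating version 3 arguments, and they note that legend.title was not recognised. Following the hints in those messages, the call can be sketched in version 4 style as below. This assumes tmap 4.0 or later is installed; treat it as a starting point rather than a definitive migration.

```r
library(tmap)   # assumes tmap >= 4.0

data("World")

m <- tm_shape(World) +
  tm_polygons(
    fill = "gdp_cap_est",                      # variable that drives the fill
    fill.scale = tm_scale_intervals(
      style  = "quantile",                     # class intervals by quantile
      values = "brewer.yl_or_rd"               # palette name used by tmap 4
    ),
    fill.legend = tm_legend(title = "GDP Per Capita Estimate")
  ) +
  tm_crs("auto")                               # projection suggested by the tip
m
```

The same pattern applies to the Boulder points: in tm_symbols(), the variable moves from col to fill, and style and palette move into tm_scale_intervals().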
It is really easy to make an interactive map in tmap as well
## the view mode creates an interactive map
tmap_mode("view")
## ℹ tmap mode set to "view".
tm_shape(World) +
tm_polygons("gdp_cap_est", style='quantile', legend.title = "GDP Per Capita Estimate")
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style' to 'tm_scale_intervals(<HERE>)'
## [tm_polygons()] Argument `legend.title` unknown.
In this week’s lab, you will make an open science markdown document that records your process of data analysis and geovisualization. We will be using Git to aid in version control for the code. Your assignment is to use knitr to develop a markdown document that shows your analysis of the Boulder data (you can also use your own data if you wish). Demonstrate how you did your analysis, giving step-by-step instructions with the accompanying code.
Discuss the advantages and challenges associated with an open data science approach. Provide an example based on this week’s reading. (1-2 paragraphs)
Create a markdown document that showcases an analysis of this week’s data or any other dataset of your choice. Include descriptive text that explains your analysis, and incorporate figures and geovisualizations. Include at least 1 chart and 1 map. Structure and explain your analysis with text, headings, highlights, images and other markdown basics.
Bonus: Capture a screenshot of the history of your Git commits. Share your strategy for utilizing Git in your workflow.
Here are the evaluation criteria for the geovisualizations. Questions will be worth 30% of your grade, while the geovisualization and explanation will be worth 70%.
| Evaluation | Highly well-done | Well-done | Some deficiencies | Several deficiencies |
|---|---|---|---|---|
| Cartographic principles - 20% (title, name, date, north arrow, scale, legend, explanation symbols) | Elements present and correctly portrayed (100%) | Most elements present and correctly portrayed (99-80%) | Some elements (when appropriate) present and correctly portrayed (79-50%) | Minimal information (<50%) |
| Presentation and Legibility - 20% (readable, consistency and ease of understanding, flow of ideas consistent with cognition, clear explanation of content) | Highly legible, consistent and easy to understand (100%) | Mostly legible, consistent and easy to understand (99-80%) | Somewhat legible, consistent and easy to understand (79-50%) | Minimally legible, consistent and poorly understandable (<50%) |
| Content - 20% (relevant, coherent and interesting topic, appropriate subject matter given the presented information/data, free of bias and error) | Highly relevant, coherent, and interesting; consistent information free of bias and error (100%) | Mostly relevant, coherent, and interesting; consistent information free of bias and error (99-80%) | Somewhat relevant, coherent, and interesting; some inconsistencies in information (79-50%) | Minimally relevant, coherent, and interesting; inconsistencies in information (<50%) |
| Aesthetics - 20% (is the map attractive, are there objective elements that are popularly viewed as beautiful) | Highly attractive/beautiful (100%) | Mostly attractive/beautiful (99-80%) | Somewhat attractive/beautiful (79-50%) | Minimally attractive/beautiful (<50%) |
| Creativity and persuasiveness - 20% (imaginative information/data, convincing argumentation, presence of sustainability principles) | Highly imaginative; convincing of sustainability principles (100%) | Mostly imaginative; convincing of sustainability principles (99-80%) | Somewhat imaginative; less convincing of sustainability principles (79-50%) | Minimally imaginative; not convincing of sustainability principles (<50%) |
It is rather simple to make your HTML publicly available via GitHub. Here is an example of one I made for a recent paper: https://derekvanberkel.github.io/Planning-for-climate-migration-in-Great-Lake-Legacy-Cities/. Below are the steps to turn the knitted HTML you make for this lab into a static website. Here is another website that gives more detail: https://blog.flycode.com/how-to-deploy-a-static-website-for-free-using-github-pages
Create a New Repository:
Fill in Repository Information:
Add Your HTML File:
Now, you need to add your HTML file to the repository. You can do
this in several ways:
- Use the GitHub web interface to upload your HTML file. Click on the “Add file” button, then select “Upload files” and follow the instructions.
- If you’re comfortable with Git, you can clone your repository to your local machine, add your index.html file to the local folder, and push the changes back to GitHub.
Commit Changes:
After adding your HTML file to the repository, you need to commit the
changes. On the GitHub website:
1. Navigate to the repository.
2. Click on the “Add file” button and select “Create a new file.”
3. Name the file index.html and add your HTML code to it.
4. Scroll down to the “Commit new file” section.
5. Enter a “Commit summary” (e.g., “Initial commit”).
6. Click the “Commit new file” button.
Configure GitHub Pages:
Once your HTML file is in the repository, go to your repository’s
main page.
1. Click on the “Settings” tab (located towards the right, under your repository’s name).
2. Scroll down to the “GitHub Pages” section.
3. Navigate to the Pages tab and click it.
4. Under the “Source” section, click the dropdown under “Branch” and select “main” (or your repository’s default branch).
5. Click the “Save” button.
Wait for Deployment:
GitHub Pages may take a few minutes to build and deploy your site. Be patient; it usually happens within 10 minutes.
Access Your Live Website:
After GitHub Pages has deployed your site, you’ll find the URL
associated with your website in the “GitHub Pages” section of your
repository’s settings. It should be something like
https://yourusername.github.io/repositoryname.
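GitHub Pages serves a file named index.html as the home page, so it can help to knit straight to that name. The sketch below writes a small placeholder document (the file name lab.Rmd, its contents and the temporary folder are made up for illustration) and renders it with rmarkdown. It assumes the rmarkdown package is installed and only renders when Pandoc is available, as it is inside RStudio.

```r
library(rmarkdown)

# Throwaway folder standing in for your cloned repository.
site_dir <- file.path(tempdir(), "my-site")
dir.create(site_dir, showWarnings = FALSE)

# A placeholder R Markdown file; replace with your own lab document.
rmd_file <- file.path(site_dir, "lab.Rmd")
writeLines(c(
  "---",
  "title: \"Lab write-up\"",
  "output: html_document",
  "---",
  "",
  "A short summary of the built-in cars data:",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_file)

# Render to index.html next to the source file (needs Pandoc).
if (pandoc_available()) {
  out <- render(rmd_file, output_file = "index.html", quiet = TRUE)
  print(basename(out))
}
```

After rendering, index.html sits in the same folder as the .Rmd file and can be uploaded or committed and pushed as described in the steps above.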